{ "cells": [ { "cell_type": "markdown", "id": "7a2d36dd", "metadata": {}, "source": [ "---\n", "title: Data extraction\n", "description: We extract the dataset with the worldfootballR library from Fbref and ...\n", "---" ] }, { "cell_type": "markdown", "id": "4ed56c64", "metadata": {}, "source": [ "We collect data from Fbref and Transfermarkt using the worldfootballR library.\n", "\n", "We collect data from 2015 to 2023 for the main European first-division leagues: England, Spain, Italy, Germany, France, Portugal, Scotland, Poland, Greece, Turkey, Switzerland, the Netherlands, Belgium, and Austria. The code below is set up for the first five of these (England, Spain, Italy, Germany, France); wider country lists are kept as comments." ] }, { "cell_type": "code", "execution_count": 1, "id": "b07eaee9", "metadata": { "vscode": { "languageId": "r" } }, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "Loading required package: worldfootballR\n", "\n", "Loading required package: readr\n", "\n" ] } ], "source": [ "if (!require(worldfootballR)) {\n", "  install.packages(\"worldfootballR\")\n", "  library(worldfootballR)\n", "}\n", "\n", "if (!require(readr)) {\n", "  install.packages(\"readr\")\n", "  library(readr)\n", "}" ] }, { "cell_type": "markdown", "id": "19a44114", "metadata": { "vscode": { "languageId": "r" } }, "source": [ "### Collecting match results" ] }, { "cell_type": "code", "execution_count": 22, "id": "db3124e1", "metadata": { "vscode": { "languageId": "r" } }, "outputs": [], "source": [ "# Change these parameters to study different countries and seasons\n", "# country <- c(\"ENG\", \"ESP\", \"ITA\", \"GER\", \"FRA\", \"POR\", \"SCO\", \"POL\", \"GRE\", \"SUI\", \"NED\", \"BEL\", \"AUT\")\n", "# year <- c(2015, 2016, 2017, 2018, 2019, 2020, 2021, 2022, 2023)\n", "\n", "country <- c(\"ENG\", \"ESP\", \"ITA\", \"GER\", \"FRA\")\n", "year <- c(2015, 2016, 2017, 2018, 2019, 2020, 2021, 2022, 2023)\n", "match_result <- fb_match_results(country = country, gender = \"M\", season_end_year = year, tier = \"1st\")" ]
}, { "cell_type": "code", "execution_count": 23, "id": "6082da76", "metadata": { "vscode": { "languageId": "r" } }, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "\t\n", "\t\n", "\n", "\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\n", "
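<table>
<caption>A data.frame: 6 x 8</caption>
<thead>
<tr><th></th><th>League</th><th>Country</th><th>Season</th><th>Date</th><th>Home</th><th>HomeGoals</th><th>Away</th><th>AwayGoals</th></tr>
<tr><th></th><th>&lt;chr&gt;</th><th>&lt;chr&gt;</th><th>&lt;int&gt;</th><th>&lt;date&gt;</th><th>&lt;chr&gt;</th><th>&lt;dbl&gt;</th><th>&lt;chr&gt;</th><th>&lt;dbl&gt;</th></tr>
</thead>
<tbody>
<tr><th>1</th><td>Premier League</td><td>ENG</td><td>2015</td><td>2014-08-16</td><td>Manchester Utd</td><td>1</td><td>Swansea City</td><td>2</td></tr>
<tr><th>2</th><td>Premier League</td><td>ENG</td><td>2015</td><td>2014-08-16</td><td>Stoke City</td><td>0</td><td>Aston Villa</td><td>1</td></tr>
<tr><th>3</th><td>Premier League</td><td>ENG</td><td>2015</td><td>2014-08-16</td><td>Leicester City</td><td>2</td><td>Everton</td><td>2</td></tr>
<tr><th>4</th><td>Premier League</td><td>ENG</td><td>2015</td><td>2014-08-16</td><td>QPR</td><td>0</td><td>Hull City</td><td>1</td></tr>
<tr><th>5</th><td>Premier League</td><td>ENG</td><td>2015</td><td>2014-08-16</td><td>West Ham</td><td>0</td><td>Tottenham</td><td>1</td></tr>
<tr><th>6</th><td>Premier League</td><td>ENG</td><td>2015</td><td>2014-08-16</td><td>West Brom</td><td>2</td><td>Sunderland</td><td>2</td></tr>
</tbody>
</table>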
\n" ], "text/latex": [ "A data.frame: 6 x 8\n", "\\begin{tabular}{r|llllllll}\n", " & League & Country & Season & Date & Home & HomeGoals & Away & AwayGoals\\\\\n", " & & & & & & & & \\\\\n", "\\hline\n", "\t1 & Premier League & ENG & 2015 & 2014-08-16 & Manchester Utd & 1 & Swansea City & 2\\\\\n", "\t2 & Premier League & ENG & 2015 & 2014-08-16 & Stoke City & 0 & Aston Villa & 1\\\\\n", "\t3 & Premier League & ENG & 2015 & 2014-08-16 & Leicester City & 2 & Everton & 2\\\\\n", "\t4 & Premier League & ENG & 2015 & 2014-08-16 & QPR & 0 & Hull City & 1\\\\\n", "\t5 & Premier League & ENG & 2015 & 2014-08-16 & West Ham & 0 & Tottenham & 1\\\\\n", "\t6 & Premier League & ENG & 2015 & 2014-08-16 & West Brom & 2 & Sunderland & 2\\\\\n", "\\end{tabular}\n" ], "text/markdown": [ "\n", "A data.frame: 6 x 8\n", "\n", "| | League <chr> | Country <chr> | Season <int> | Date <date> | Home <chr> | HomeGoals <dbl> | Away <chr> | AwayGoals <dbl> |\n", "|---|---|---|---|---|---|---|---|---|\n", "| 1 | Premier League | ENG | 2015 | 2014-08-16 | Manchester Utd | 1 | Swansea City | 2 |\n", "| 2 | Premier League | ENG | 2015 | 2014-08-16 | Stoke City | 0 | Aston Villa | 1 |\n", "| 3 | Premier League | ENG | 2015 | 2014-08-16 | Leicester City | 2 | Everton | 2 |\n", "| 4 | Premier League | ENG | 2015 | 2014-08-16 | QPR | 0 | Hull City | 1 |\n", "| 5 | Premier League | ENG | 2015 | 2014-08-16 | West Ham | 0 | Tottenham | 1 |\n", "| 6 | Premier League | ENG | 2015 | 2014-08-16 | West Brom | 2 | Sunderland | 2 |\n", "\n" ], "text/plain": [ " League Country Season Date Home HomeGoals\n", "1 Premier League ENG 2015 2014-08-16 Manchester Utd 1 \n", "2 Premier League ENG 2015 2014-08-16 Stoke City 0 \n", "3 Premier League ENG 2015 2014-08-16 Leicester City 2 \n", "4 Premier League ENG 2015 2014-08-16 QPR 0 \n", "5 Premier League ENG 2015 2014-08-16 West Ham 0 \n", "6 Premier League ENG 2015 2014-08-16 West Brom 2 \n", " Away AwayGoals\n", "1 Swansea City 2 \n", "2 Aston Villa 1 \n", "3 Everton 
2 \n", "4 Hull City 1 \n", "5 Tottenham 1 \n", "6 Sunderland 2 " ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "columns_to_keep <- c('Competition_Name', 'Country', 'Season_End_Year', 'Date', 'Home', 'HomeGoals', 'Away', 'AwayGoals')\n", "match_result <- match_result[, columns_to_keep]\n", "# Rename columns\n", "colnames(match_result) <- c('League', 'Country', 'Season', 'Date', 'Home', 'HomeGoals', 'Away', 'AwayGoals')\n", "head(match_result)" ] }, { "cell_type": "code", "execution_count": 24, "id": "716d6226", "metadata": { "vscode": { "languageId": "r" } }, "outputs": [ { "data": { "text/html": [ "\n", "
  1. 'Premier League'
  2. 'La Liga'
  3. 'Ligue 1'
  4. 'Fu\\303\\237ball-Bundesliga'
  5. 'Serie A'
\n" ], "text/latex": [ "\\begin{enumerate*}\n", "\\item 'Premier League'\n", "\\item 'La Liga'\n", "\\item 'Ligue 1'\n", "\\item 'Fu\\textbackslash{}303\\textbackslash{}237ball-Bundesliga'\n", "\\item 'Serie A'\n", "\\end{enumerate*}\n" ], "text/markdown": [ "1. 'Premier League'\n", "2. 'La Liga'\n", "3. 'Ligue 1'\n", "4. 'Fu\\303\\237ball-Bundesliga'\n", "5. 'Serie A'\n", "\n", "\n" ], "text/plain": [ "[1] \"Premier League\" \"La Liga\" \n", "[3] \"Ligue 1\" \"Fu\\303\\237ball-Bundesliga\"\n", "[5] \"Serie A\" " ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "unique(match_result$League)\n", "# Fix League name for Bundesliga\n", "match_result$League <- gsub(\"Fu\\303\\237ball-Bundesliga\", \"Bundesliga\", match_result$League)" ] }, { "cell_type": "code", "execution_count": 25, "id": "d48409eb", "metadata": { "vscode": { "languageId": "r" } }, "outputs": [ { "data": { "text/plain": [ " League Country Season Date \n", " Length:16463 Length:16463 Min. :2015 Min. :2014-08-08 \n", " Class :character Class :character 1st Qu.:2017 1st Qu.:2016-10-29 \n", " Mode :character Mode :character Median :2019 Median :2019-01-11 \n", " Mean :2019 Mean :2019-01-10 \n", " 3rd Qu.:2021 3rd Qu.:2021-03-21 \n", " Max. :2023 Max. :2023-06-11 \n", " \n", " Home HomeGoals Away AwayGoals \n", " Length:16463 Min. : 0.000 Length:16463 Min. :0.000 \n", " Class :character 1st Qu.: 1.000 Class :character 1st Qu.:0.000 \n", " Mode :character Median : 1.000 Mode :character Median :1.000 \n", " Mean : 1.536 Mean :1.213 \n", " 3rd Qu.: 2.000 3rd Qu.:2.000 \n", " Max. :10.000 Max. 
:9.000 \n", " NA's :101 NA's :101 " ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "summary(match_result)" ] }, { "cell_type": "code", "execution_count": 26, "id": "60b04e68", "metadata": { "vscode": { "languageId": "r" } }, "outputs": [], "source": [ "# Saving the data\n", "write_csv(match_result, \"data/extracted_match_results.csv\")" ] }, { "cell_type": "markdown", "id": "0dd1d968", "metadata": {}, "source": [ "### Collecting head coach data" ] }, { "cell_type": "code", "execution_count": 2, "id": "28e076f6", "metadata": { "vscode": { "languageId": "r" } }, "outputs": [], "source": [ "# Change country to study different teams\n", "# country <- c(\"England\", \"Spain\", \"Italy\", \"Germany\", \"France\", \"Portugal\", \"Scotland\", \"Poland\", \"Greece\", \"Switzerland\", \"Netherlands\", \"Belgium\")\n", "\n", "country <- c(\"England\", \"Spain\", \"Italy\", \"Germany\", \"France\")\n", "teams_url <- c()\n", "\n", "for (i in seq_along(country)) {\n", " league_team_url <- tm_league_team_urls(country_name = country[i], start_year = 2015)\n", " teams_url <- c(teams_url, league_team_url)\n", "}" ] }, { "cell_type": "code", "execution_count": 3, "id": "d8ce18f9", "metadata": { "vscode": { "languageId": "r" } }, "outputs": [ { "data": { "text/html": [ "\n", "
  1. 'Premier League'
  2. 'Championship'
  3. 'LaLiga'
  4. 'Primera Federación - Grupo II'
  5. 'LaLiga2'
  6. 'Primera Federación - Grupo I'
  7. 'Serie A'
  8. 'Serie B'
  9. 'Serie D - Girone D'
  10. NA
  11. 'Bundesliga'
  12. '2. Bundesliga'
  13. '3. Liga'
  14. 'Ligue 1'
  15. 'Ligue 2'
\n" ], "text/latex": [ "\\begin{enumerate*}\n", "\\item 'Premier League'\n", "\\item 'Championship'\n", "\\item 'LaLiga'\n", "\\item 'Primera Federación - Grupo II'\n", "\\item 'LaLiga2'\n", "\\item 'Primera Federación - Grupo I'\n", "\\item 'Serie A'\n", "\\item 'Serie B'\n", "\\item 'Serie D - Girone D'\n", "\\item NA\n", "\\item 'Bundesliga'\n", "\\item '2. Bundesliga'\n", "\\item '3. Liga'\n", "\\item 'Ligue 1'\n", "\\item 'Ligue 2'\n", "\\end{enumerate*}\n" ], "text/markdown": [ "1. 'Premier League'\n", "2. 'Championship'\n", "3. 'LaLiga'\n", "4. 'Primera Federación - Grupo II'\n", "5. 'LaLiga2'\n", "6. 'Primera Federación - Grupo I'\n", "7. 'Serie A'\n", "8. 'Serie B'\n", "9. 'Serie D - Girone D'\n", "10. NA\n", "11. 'Bundesliga'\n", "12. '2. Bundesliga'\n", "13. '3. Liga'\n", "14. 'Ligue 1'\n", "15. 'Ligue 2'\n", "\n", "\n" ], "text/plain": [ " [1] \"Premier League\" \"Championship\" \n", " [3] \"LaLiga\" \"Primera Federación - Grupo II\"\n", " [5] \"LaLiga2\" \"Primera Federación - Grupo I\"\n", " [7] \"Serie A\" \"Serie B\" \n", " [9] \"Serie D - Girone D\" NA \n", "[11] \"Bundesliga\" \"2. Bundesliga\" \n", "[13] \"3. Liga\" \"Ligue 1\" \n", "[15] \"Ligue 2\" " ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "\n", "
  1. 'Chelsea FC'
  2. 'Manchester City'
  3. 'Arsenal FC'
  4. 'Manchester United'
  5. 'Liverpool FC'
  6. 'Tottenham Hotspur'
  7. 'Everton FC'
  8. 'Southampton FC'
  9. 'West Ham United'
  10. 'Newcastle United'
  11. 'Leicester City'
  12. 'Stoke City'
  13. 'Swansea City'
  14. 'Watford FC'
  15. 'Crystal Palace'
  16. 'Aston Villa'
  17. 'Norwich City'
  18. 'West Bromwich Albion'
  19. 'Sunderland AFC'
  20. 'AFC Bournemouth'
  21. 'FC Barcelona'
  22. 'Real Madrid'
  23. 'Atlético de Madrid'
  24. 'Valencia CF'
  25. 'Sevilla FC'
  26. 'Athletic Bilbao'
  27. 'Villarreal CF'
  28. 'Real Sociedad'
  29. 'Celta de Vigo'
  30. 'Málaga CF'
  31. 'Granada CF'
  32. 'RCD Espanyol Barcelona'
  33. 'Deportivo de La Coruña'
  34. 'SD Eibar'
  35. 'Real Betis Balompié'
  36. 'Sporting Gijón'
  37. 'Levante UD'
  38. 'Rayo Vallecano'
  39. 'Getafe CF'
  40. 'UD Las Palmas'
  41. 'Juventus FC'
  42. 'AS Roma'
  43. 'SSC Napoli'
  44. 'Inter Milan'
  45. 'AC Milan'
  46. 'ACF Fiorentina'
  47. 'SS Lazio'
  48. 'UC Sampdoria'
  49. 'Genoa CFC'
  50. 'US Sassuolo'
  51. 'Udinese Calcio'
  52. 'Torino FC'
  53. 'Atalanta BC'
  54. 'Bologna FC 1909'
  55. 'FC Empoli'
  56. 'Palermo FC'
  57. 'AC Carpi'
  58. 'Chievo Verona'
  59. 'Hellas Verona'
  60. 'Frosinone Calcio'
  61. 'Bayern Munich'
  62. 'Borussia Dortmund'
  63. 'VfL Wolfsburg'
  64. 'Bayer 04 Leverkusen'
  65. 'FC Schalke 04'
  66. 'Borussia Mönchengladbach'
  67. 'TSG 1899 Hoffenheim'
  68. '1.FSV Mainz 05'
  69. 'VfB Stuttgart'
  70. 'Hertha BSC'
  71. 'Eintracht Frankfurt'
  72. 'FC Augsburg'
  73. '1.FC Köln'
  74. 'SV Werder Bremen'
  75. 'Hamburger SV'
  76. 'Hannover 96'
  77. 'FC Ingolstadt 04'
  78. 'SV Darmstadt 98'
  79. 'Paris Saint-Germain'
  80. 'AS Monaco'
  81. 'Olympique Lyon'
  82. 'Olympique Marseille'
  83. 'AS Saint-Étienne'
  84. 'Stade Rennais FC'
  85. 'FC Girondins Bordeaux'
  86. 'OGC Nice'
  87. 'LOSC Lille'
  88. 'FC Lorient'
  89. 'Montpellier HSC'
  90. 'Stade Reims'
  91. 'FC Toulouse'
  92. 'FC Nantes'
  93. 'SM Caen'
  94. 'EA Guingamp'
  95. 'Angers SCO'
  96. 'ESTAC Troyes'
  97. 'SC Bastia'
  98. 'GFC Ajaccio'
\n" ], "text/latex": [ "\\begin{enumerate*}\n", "\\item 'Chelsea FC'\n", "\\item 'Manchester City'\n", "\\item 'Arsenal FC'\n", "\\item 'Manchester United'\n", "\\item 'Liverpool FC'\n", "\\item 'Tottenham Hotspur'\n", "\\item 'Everton FC'\n", "\\item 'Southampton FC'\n", "\\item 'West Ham United'\n", "\\item 'Newcastle United'\n", "\\item 'Leicester City'\n", "\\item 'Stoke City'\n", "\\item 'Swansea City'\n", "\\item 'Watford FC'\n", "\\item 'Crystal Palace'\n", "\\item 'Aston Villa'\n", "\\item 'Norwich City'\n", "\\item 'West Bromwich Albion'\n", "\\item 'Sunderland AFC'\n", "\\item 'AFC Bournemouth'\n", "\\item 'FC Barcelona'\n", "\\item 'Real Madrid'\n", "\\item 'Atlético de Madrid'\n", "\\item 'Valencia CF'\n", "\\item 'Sevilla FC'\n", "\\item 'Athletic Bilbao'\n", "\\item 'Villarreal CF'\n", "\\item 'Real Sociedad'\n", "\\item 'Celta de Vigo'\n", "\\item 'Málaga CF'\n", "\\item 'Granada CF'\n", "\\item 'RCD Espanyol Barcelona'\n", "\\item 'Deportivo de La Coruña'\n", "\\item 'SD Eibar'\n", "\\item 'Real Betis Balompié'\n", "\\item 'Sporting Gijón'\n", "\\item 'Levante UD'\n", "\\item 'Rayo Vallecano'\n", "\\item 'Getafe CF'\n", "\\item 'UD Las Palmas'\n", "\\item 'Juventus FC'\n", "\\item 'AS Roma'\n", "\\item 'SSC Napoli'\n", "\\item 'Inter Milan'\n", "\\item 'AC Milan'\n", "\\item 'ACF Fiorentina'\n", "\\item 'SS Lazio'\n", "\\item 'UC Sampdoria'\n", "\\item 'Genoa CFC'\n", "\\item 'US Sassuolo'\n", "\\item 'Udinese Calcio'\n", "\\item 'Torino FC'\n", "\\item 'Atalanta BC'\n", "\\item 'Bologna FC 1909'\n", "\\item 'FC Empoli'\n", "\\item 'Palermo FC'\n", "\\item 'AC Carpi'\n", "\\item 'Chievo Verona'\n", "\\item 'Hellas Verona'\n", "\\item 'Frosinone Calcio'\n", "\\item 'Bayern Munich'\n", "\\item 'Borussia Dortmund'\n", "\\item 'VfL Wolfsburg'\n", "\\item 'Bayer 04 Leverkusen'\n", "\\item 'FC Schalke 04'\n", "\\item 'Borussia Mönchengladbach'\n", "\\item 'TSG 1899 Hoffenheim'\n", "\\item '1.FSV Mainz 05'\n", "\\item 'VfB Stuttgart'\n", "\\item 'Hertha 
BSC'\n", "\\item 'Eintracht Frankfurt'\n", "\\item 'FC Augsburg'\n", "\\item '1.FC Köln'\n", "\\item 'SV Werder Bremen'\n", "\\item 'Hamburger SV'\n", "\\item 'Hannover 96'\n", "\\item 'FC Ingolstadt 04'\n", "\\item 'SV Darmstadt 98'\n", "\\item 'Paris Saint-Germain'\n", "\\item 'AS Monaco'\n", "\\item 'Olympique Lyon'\n", "\\item 'Olympique Marseille'\n", "\\item 'AS Saint-Étienne'\n", "\\item 'Stade Rennais FC'\n", "\\item 'FC Girondins Bordeaux'\n", "\\item 'OGC Nice'\n", "\\item 'LOSC Lille'\n", "\\item 'FC Lorient'\n", "\\item 'Montpellier HSC'\n", "\\item 'Stade Reims'\n", "\\item 'FC Toulouse'\n", "\\item 'FC Nantes'\n", "\\item 'SM Caen'\n", "\\item 'EA Guingamp'\n", "\\item 'Angers SCO'\n", "\\item 'ESTAC Troyes'\n", "\\item 'SC Bastia'\n", "\\item 'GFC Ajaccio'\n", "\\end{enumerate*}\n" ], "text/markdown": [ "1. 'Chelsea FC'\n", "2. 'Manchester City'\n", "3. 'Arsenal FC'\n", "4. 'Manchester United'\n", "5. 'Liverpool FC'\n", "6. 'Tottenham Hotspur'\n", "7. 'Everton FC'\n", "8. 'Southampton FC'\n", "9. 'West Ham United'\n", "10. 'Newcastle United'\n", "11. 'Leicester City'\n", "12. 'Stoke City'\n", "13. 'Swansea City'\n", "14. 'Watford FC'\n", "15. 'Crystal Palace'\n", "16. 'Aston Villa'\n", "17. 'Norwich City'\n", "18. 'West Bromwich Albion'\n", "19. 'Sunderland AFC'\n", "20. 'AFC Bournemouth'\n", "21. 'FC Barcelona'\n", "22. 'Real Madrid'\n", "23. 'Atlético de Madrid'\n", "24. 'Valencia CF'\n", "25. 'Sevilla FC'\n", "26. 'Athletic Bilbao'\n", "27. 'Villarreal CF'\n", "28. 'Real Sociedad'\n", "29. 'Celta de Vigo'\n", "30. 'Málaga CF'\n", "31. 'Granada CF'\n", "32. 'RCD Espanyol Barcelona'\n", "33. 'Deportivo de La Coruña'\n", "34. 'SD Eibar'\n", "35. 'Real Betis Balompié'\n", "36. 'Sporting Gijón'\n", "37. 'Levante UD'\n", "38. 'Rayo Vallecano'\n", "39. 'Getafe CF'\n", "40. 'UD Las Palmas'\n", "41. 'Juventus FC'\n", "42. 'AS Roma'\n", "43. 'SSC Napoli'\n", "44. 'Inter Milan'\n", "45. 'AC Milan'\n", "46. 'ACF Fiorentina'\n", "47. 'SS Lazio'\n", "48. 
'UC Sampdoria'\n", "49. 'Genoa CFC'\n", "50. 'US Sassuolo'\n", "51. 'Udinese Calcio'\n", "52. 'Torino FC'\n", "53. 'Atalanta BC'\n", "54. 'Bologna FC 1909'\n", "55. 'FC Empoli'\n", "56. 'Palermo FC'\n", "57. 'AC Carpi'\n", "58. 'Chievo Verona'\n", "59. 'Hellas Verona'\n", "60. 'Frosinone Calcio'\n", "61. 'Bayern Munich'\n", "62. 'Borussia Dortmund'\n", "63. 'VfL Wolfsburg'\n", "64. 'Bayer 04 Leverkusen'\n", "65. 'FC Schalke 04'\n", "66. 'Borussia Mönchengladbach'\n", "67. 'TSG 1899 Hoffenheim'\n", "68. '1.FSV Mainz 05'\n", "69. 'VfB Stuttgart'\n", "70. 'Hertha BSC'\n", "71. 'Eintracht Frankfurt'\n", "72. 'FC Augsburg'\n", "73. '1.FC Köln'\n", "74. 'SV Werder Bremen'\n", "75. 'Hamburger SV'\n", "76. 'Hannover 96'\n", "77. 'FC Ingolstadt 04'\n", "78. 'SV Darmstadt 98'\n", "79. 'Paris Saint-Germain'\n", "80. 'AS Monaco'\n", "81. 'Olympique Lyon'\n", "82. 'Olympique Marseille'\n", "83. 'AS Saint-Étienne'\n", "84. 'Stade Rennais FC'\n", "85. 'FC Girondins Bordeaux'\n", "86. 'OGC Nice'\n", "87. 'LOSC Lille'\n", "88. 'FC Lorient'\n", "89. 'Montpellier HSC'\n", "90. 'Stade Reims'\n", "91. 'FC Toulouse'\n", "92. 'FC Nantes'\n", "93. 'SM Caen'\n", "94. 'EA Guingamp'\n", "95. 'Angers SCO'\n", "96. 'ESTAC Troyes'\n", "97. 'SC Bastia'\n", "98. 
'GFC Ajaccio'\n", "\n", "\n" ], "text/plain": [ " [1] \"Chelsea FC\" \"Manchester City\" \n", " [3] \"Arsenal FC\" \"Manchester United\" \n", " [5] \"Liverpool FC\" \"Tottenham Hotspur\" \n", " [7] \"Everton FC\" \"Southampton FC\" \n", " [9] \"West Ham United\" \"Newcastle United\" \n", "[11] \"Leicester City\" \"Stoke City\" \n", "[13] \"Swansea City\" \"Watford FC\" \n", "[15] \"Crystal Palace\" \"Aston Villa\" \n", "[17] \"Norwich City\" \"West Bromwich Albion\" \n", "[19] \"Sunderland AFC\" \"AFC Bournemouth\" \n", "[21] \"FC Barcelona\" \"Real Madrid\" \n", "[23] \"Atlético de Madrid\" \"Valencia CF\" \n", "[25] \"Sevilla FC\" \"Athletic Bilbao\" \n", "[27] \"Villarreal CF\" \"Real Sociedad\" \n", "[29] \"Celta de Vigo\" \"Málaga CF\" \n", "[31] \"Granada CF\" \"RCD Espanyol Barcelona\" \n", "[33] \"Deportivo de La Coruña\" \"SD Eibar\" \n", "[35] \"Real Betis Balompié\" \"Sporting Gijón\" \n", "[37] \"Levante UD\" \"Rayo Vallecano\" \n", "[39] \"Getafe CF\" \"UD Las Palmas\" \n", "[41] \"Juventus FC\" \"AS Roma\" \n", "[43] \"SSC Napoli\" \"Inter Milan\" \n", "[45] \"AC Milan\" \"ACF Fiorentina\" \n", "[47] \"SS Lazio\" \"UC Sampdoria\" \n", "[49] \"Genoa CFC\" \"US Sassuolo\" \n", "[51] \"Udinese Calcio\" \"Torino FC\" \n", "[53] \"Atalanta BC\" \"Bologna FC 1909\" \n", "[55] \"FC Empoli\" \"Palermo FC\" \n", "[57] \"AC Carpi\" \"Chievo Verona\" \n", "[59] \"Hellas Verona\" \"Frosinone Calcio\" \n", "[61] \"Bayern Munich\" \"Borussia Dortmund\" \n", "[63] \"VfL Wolfsburg\" \"Bayer 04 Leverkusen\" \n", "[65] \"FC Schalke 04\" \"Borussia Mönchengladbach\"\n", "[67] \"TSG 1899 Hoffenheim\" \"1.FSV Mainz 05\" \n", "[69] \"VfB Stuttgart\" \"Hertha BSC\" \n", "[71] \"Eintracht Frankfurt\" \"FC Augsburg\" \n", "[73] \"1.FC Köln\" \"SV Werder Bremen\" \n", "[75] \"Hamburger SV\" \"Hannover 96\" \n", "[77] \"FC Ingolstadt 04\" \"SV Darmstadt 98\" \n", "[79] \"Paris Saint-Germain\" \"AS Monaco\" \n", "[81] \"Olympique Lyon\" \"Olympique Marseille\" \n", "[83] \"AS
Saint-Étienne\" \"Stade Rennais FC\" \n", "[85] \"FC Girondins Bordeaux\" \"OGC Nice\" \n", "[87] \"LOSC Lille\" \"FC Lorient\" \n", "[89] \"Montpellier HSC\" \"Stade Reims\" \n", "[91] \"FC Toulouse\" \"FC Nantes\" \n", "[93] \"SM Caen\" \"EA Guingamp\" \n", "[95] \"Angers SCO\" \"ESTAC Troyes\" \n", "[97] \"SC Bastia\" \"GFC Ajaccio\" " ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "head_coach <- tm_team_staff_history(team_urls = teams_url, staff_role = \"Manager\")\n", "unique(head_coach$league)\n", "unique(head_coach$team)" ] }, { "cell_type": "markdown", "id": "553e0185", "metadata": {}, "source": [ "Some rows are missing their country or league (the league list above contains an NA). We fill in these values manually." ] }, { "cell_type": "code", "execution_count": 15, "id": "9b978c62", "metadata": { "vscode": { "languageId": "r" } }, "outputs": [ { "data": { "text/html": [ "
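<dl>
<dt>team_name</dt><dd>0</dd>
<dt>league</dt><dd>0</dd>
<dt>country</dt><dd>0</dd>
<dt>staff_role</dt><dd>0</dd>
<dt>staff_name</dt><dd>0</dd>
<dt>staff_url</dt><dd>0</dd>
<dt>staff_dob</dt><dd>0</dd>
<dt>staff_nationality</dt><dd>4</dd>
<dt>staff_nationality_secondary</dt><dd>4424</dd>
<dt>appointed</dt><dd>0</dd>
<dt>end_date</dt><dd>90</dd>
<dt>days_in_post</dt><dd>0</dd>
<dt>matches</dt><dd>0</dd>
<dt>wins</dt><dd>0</dd>
<dt>draws</dt><dd>0</dd>
<dt>losses</dt><dd>0</dd>
<dt>ppg</dt><dd>0</dd>
</dl>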
\n" ], "text/latex": [ "\\begin{description*}\n", "\\item[team\\textbackslash{}\\_name] 0\n", "\\item[league] 0\n", "\\item[country] 0\n", "\\item[staff\\textbackslash{}\\_role] 0\n", "\\item[staff\\textbackslash{}\\_name] 0\n", "\\item[staff\\textbackslash{}\\_url] 0\n", "\\item[staff\\textbackslash{}\\_dob] 0\n", "\\item[staff\\textbackslash{}\\_nationality] 4\n", "\\item[staff\\textbackslash{}\\_nationality\\textbackslash{}\\_secondary] 4424\n", "\\item[appointed] 0\n", "\\item[end\\textbackslash{}\\_date] 90\n", "\\item[days\\textbackslash{}\\_in\\textbackslash{}\\_post] 0\n", "\\item[matches] 0\n", "\\item[wins] 0\n", "\\item[draws] 0\n", "\\item[losses] 0\n", "\\item[ppg] 0\n", "\\end{description*}\n" ], "text/markdown": [ "team_name\n", ": 0league\n", ": 0country\n", ": 0staff_role\n", ": 0staff_name\n", ": 0staff_url\n", ": 0staff_dob\n", ": 0staff_nationality\n", ": 4staff_nationality_secondary\n", ": 4424appointed\n", ": 0end_date\n", ": 90days_in_post\n", ": 0matches\n", ": 0wins\n", ": 0draws\n", ": 0losses\n", ": 0ppg\n", ": 0\n", "\n" ], "text/plain": [ " team_name league \n", " 0 0 \n", " country staff_role \n", " 0 0 \n", " staff_name staff_url \n", " 0 0 \n", " staff_dob staff_nationality \n", " 0 4 \n", "staff_nationality_secondary appointed \n", " 4424 0 \n", " end_date days_in_post \n", " 90 0 \n", " matches wins \n", " 0 0 \n", " draws losses \n", " 0 0 \n", " ppg \n", " 0 " ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [], "text/latex": [], "text/markdown": [], "text/plain": [ "character(0)" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "sapply(head_coach, function(x) sum(is.na(x)))\n", "# Show unique teams with missing league and or country\n", "unique(head_coach$team[is.na(head_coach$league) | is.na(head_coach$country)])" ] }, { "cell_type": "code", "execution_count": 16, "id": "48ade17d", "metadata": { "vscode": { "languageId": "r" } }, "outputs": [], "source": [ "# Fix league and 
country for 'Chievo Verona' and 'GFC Ajaccio'\n", "# Note: head_coach$team resolves to the team_name column through data.frame partial matching\n", "head_coach$league[head_coach$team == 'Chievo Verona'] <- 'Serie A'\n", "head_coach$country[head_coach$team == 'Chievo Verona'] <- 'Italy'\n", "head_coach$league[head_coach$team == 'GFC Ajaccio'] <- 'Ligue 2'\n", "head_coach$country[head_coach$team == 'GFC Ajaccio'] <- 'France'" ] }, { "cell_type": "markdown", "id": "23b0db35", "metadata": {}, "source": [ "Keep only the first-division leagues and drop every other league." ] }, { "cell_type": "code", "execution_count": 17, "id": "5c06a46b", "metadata": { "vscode": { "languageId": "r" } }, "outputs": [ { "data": { "text/html": [ "TRUE" ], "text/latex": [ "TRUE" ], "text/markdown": [ "TRUE" ], "text/plain": [ "[1] TRUE" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "\n", "\n", "\n", "\t\n", "\t\n", "\n", "\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\n", "
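<table>
<caption>A data.frame: 5 x 17</caption>
<thead>
<tr><th></th><th>team_name</th><th>league</th><th>country</th><th>staff_role</th><th>staff_name</th><th>staff_url</th><th>staff_dob</th><th>staff_nationality</th><th>staff_nationality_secondary</th><th>appointed</th><th>end_date</th><th>days_in_post</th><th>matches</th><th>wins</th><th>draws</th><th>losses</th><th>ppg</th></tr>
<tr><th></th><th>&lt;chr&gt;</th><th>&lt;chr&gt;</th><th>&lt;chr&gt;</th><th>&lt;chr&gt;</th><th>&lt;chr&gt;</th><th>&lt;chr&gt;</th><th>&lt;chr&gt;</th><th>&lt;chr&gt;</th><th>&lt;chr&gt;</th><th>&lt;date&gt;</th><th>&lt;date&gt;</th><th>&lt;dbl&gt;</th><th>&lt;dbl&gt;</th><th>&lt;dbl&gt;</th><th>&lt;dbl&gt;</th><th>&lt;dbl&gt;</th><th>&lt;dbl&gt;</th></tr>
</thead>
<tbody>
<tr><th>1</th><td>Chelsea FC</td><td>Premier League</td><td>England</td><td>Manager</td><td>Mauricio Pochettino</td><td>https://www.transfermarkt.com/mauricio-pochettino/profil/trainer/9044</td><td>Mar 2, 1972</td><td>Argentina</td><td>Spain</td><td>2023-07-01</td><td>NA</td><td>296</td><td>44</td><td>22</td><td>9</td><td>13</td><td>1.70</td></tr>
<tr><th>2</th><td>Chelsea FC</td><td>Premier League</td><td>England</td><td>Manager</td><td>Graham Potter</td><td>https://www.transfermarkt.com/graham-potter/profil/trainer/23954</td><td>May 20, 1975</td><td>England</td><td>NA</td><td>2022-09-08</td><td>2023-04-02</td><td>206</td><td>31</td><td>12</td><td>8</td><td>11</td><td>1.42</td></tr>
<tr><th>3</th><td>Chelsea FC</td><td>Premier League</td><td>England</td><td>Manager</td><td>Thomas Tuchel</td><td>https://www.transfermarkt.com/thomas-tuchel/profil/trainer/7471</td><td>Aug 29, 1973</td><td>Germany</td><td>NA</td><td>2021-01-26</td><td>2022-09-07</td><td>589</td><td>100</td><td>63</td><td>19</td><td>18</td><td>2.08</td></tr>
<tr><th>4</th><td>Chelsea FC</td><td>Premier League</td><td>England</td><td>Manager</td><td>Frank Lampard</td><td>https://www.transfermarkt.com/frank-lampard/profil/trainer/60805</td><td>Jun 20, 1978</td><td>England</td><td>NA</td><td>2019-07-04</td><td>2021-01-25</td><td>571</td><td>84</td><td>44</td><td>15</td><td>25</td><td>1.75</td></tr>
<tr><th>5</th><td>Chelsea FC</td><td>Premier League</td><td>England</td><td>Manager</td><td>Maurizio Sarri</td><td>https://www.transfermarkt.com/maurizio-sarri/profil/trainer/10073</td><td>Jan 10, 1959</td><td>Italy</td><td>NA</td><td>2018-07-14</td><td>2019-06-30</td><td>351</td><td>63</td><td>40</td><td>11</td><td>12</td><td>2.08</td></tr>
</tbody>
</table>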
\n" ], "text/latex": [ "A data.frame: 5 x 17\n", "\\begin{tabular}{r|lllllllllllllllll}\n", " & team\\_name & league & country & staff\\_role & staff\\_name & staff\\_url & staff\\_dob & staff\\_nationality & staff\\_nationality\\_secondary & appointed & end\\_date & days\\_in\\_post & matches & wins & draws & losses & ppg\\\\\n", " & & & & & & & & & & & & & & & & & \\\\\n", "\\hline\n", "\t1 & Chelsea FC & Premier League & England & Manager & Mauricio Pochettino & https://www.transfermarkt.com/mauricio-pochettino/profil/trainer/9044 & Mar 2, 1972 & Argentina & Spain & 2023-07-01 & NA & 296 & 44 & 22 & 9 & 13 & 1.70\\\\\n", "\t2 & Chelsea FC & Premier League & England & Manager & Graham Potter & https://www.transfermarkt.com/graham-potter/profil/trainer/23954 & May 20, 1975 & England & NA & 2022-09-08 & 2023-04-02 & 206 & 31 & 12 & 8 & 11 & 1.42\\\\\n", "\t3 & Chelsea FC & Premier League & England & Manager & Thomas Tuchel & https://www.transfermarkt.com/thomas-tuchel/profil/trainer/7471 & Aug 29, 1973 & Germany & NA & 2021-01-26 & 2022-09-07 & 589 & 100 & 63 & 19 & 18 & 2.08\\\\\n", "\t4 & Chelsea FC & Premier League & England & Manager & Frank Lampard & https://www.transfermarkt.com/frank-lampard/profil/trainer/60805 & Jun 20, 1978 & England & NA & 2019-07-04 & 2021-01-25 & 571 & 84 & 44 & 15 & 25 & 1.75\\\\\n", "\t5 & Chelsea FC & Premier League & England & Manager & Maurizio Sarri & https://www.transfermarkt.com/maurizio-sarri/profil/trainer/10073 & Jan 10, 1959 & Italy & NA & 2018-07-14 & 2019-06-30 & 351 & 63 & 40 & 11 & 12 & 2.08\\\\\n", "\\end{tabular}\n" ], "text/markdown": [ "\n", "A data.frame: 5 x 17\n", "\n", "| | team_name <chr> | league <chr> | country <chr> | staff_role <chr> | staff_name <chr> | staff_url <chr> | staff_dob <chr> | staff_nationality <chr> | staff_nationality_secondary <chr> | appointed <date> | end_date <date> | days_in_post <dbl> | matches <dbl> | wins <dbl> | draws <dbl> | losses <dbl> | ppg <dbl> |\n", 
"|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|\n", "| 1 | Chelsea FC | Premier League | England | Manager | Mauricio Pochettino | https://www.transfermarkt.com/mauricio-pochettino/profil/trainer/9044 | Mar 2, 1972 | Argentina | Spain | 2023-07-01 | NA | 296 | 44 | 22 | 9 | 13 | 1.70 |\n", "| 2 | Chelsea FC | Premier League | England | Manager | Graham Potter | https://www.transfermarkt.com/graham-potter/profil/trainer/23954 | May 20, 1975 | England | NA | 2022-09-08 | 2023-04-02 | 206 | 31 | 12 | 8 | 11 | 1.42 |\n", "| 3 | Chelsea FC | Premier League | England | Manager | Thomas Tuchel | https://www.transfermarkt.com/thomas-tuchel/profil/trainer/7471 | Aug 29, 1973 | Germany | NA | 2021-01-26 | 2022-09-07 | 589 | 100 | 63 | 19 | 18 | 2.08 |\n", "| 4 | Chelsea FC | Premier League | England | Manager | Frank Lampard | https://www.transfermarkt.com/frank-lampard/profil/trainer/60805 | Jun 20, 1978 | England | NA | 2019-07-04 | 2021-01-25 | 571 | 84 | 44 | 15 | 25 | 1.75 |\n", "| 5 | Chelsea FC | Premier League | England | Manager | Maurizio Sarri | https://www.transfermarkt.com/maurizio-sarri/profil/trainer/10073 | Jan 10, 1959 | Italy | NA | 2018-07-14 | 2019-06-30 | 351 | 63 | 40 | 11 | 12 | 2.08 |\n", "\n" ], "text/plain": [ " team_name league country staff_role staff_name \n", "1 Chelsea FC Premier League England Manager Mauricio Pochettino\n", "2 Chelsea FC Premier League England Manager Graham Potter \n", "3 Chelsea FC Premier League England Manager Thomas Tuchel \n", "4 Chelsea FC Premier League England Manager Frank Lampard \n", "5 Chelsea FC Premier League England Manager Maurizio Sarri \n", " staff_url \n", "1 https://www.transfermarkt.com/mauricio-pochettino/profil/trainer/9044\n", "2 https://www.transfermarkt.com/graham-potter/profil/trainer/23954 \n", "3 https://www.transfermarkt.com/thomas-tuchel/profil/trainer/7471 \n", "4 https://www.transfermarkt.com/frank-lampard/profil/trainer/60805 \n", "5 
https://www.transfermarkt.com/maurizio-sarri/profil/trainer/10073 \n", " staff_dob staff_nationality staff_nationality_secondary appointed \n", "1 Mar 2, 1972 Argentina Spain 2023-07-01\n", "2 May 20, 1975 England NA 2022-09-08\n", "3 Aug 29, 1973 Germany NA 2021-01-26\n", "4 Jun 20, 1978 England NA 2019-07-04\n", "5 Jan 10, 1959 Italy NA 2018-07-14\n", " end_date days_in_post matches wins draws losses ppg \n", "1 <NA> 296 44 22 9 13 1.70\n", "2 2023-04-02 206 31 12 8 11 1.42\n", "3 2022-09-07 589 100 63 19 18 2.08\n", "4 2021-01-25 571 84 44 15 25 1.75\n", "5 2019-06-30 351 63 40 11 12 2.08" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "# Filter out teams that are not in a first-division league\n", "# first_division_teams <- c(\n", "# 'Premier League', 'LaLiga', 'Serie A', 'Bundesliga', 'Ligue 1', \n", "# 'Liga Portugal', 'Scottish Premiership', 'PKO BP Ekstraklasa', 'Super League 1', \n", "# 'Super League', 'Eredivisie', 'Jupiler Pro League')\n", "\n", "first_division_teams <- c('Premier League', 'LaLiga', 'Serie A', 'Bundesliga', 'Ligue 1')\n", "# Check that every league in first_division_teams appears in head_coach$league\n", "all(first_division_teams %in% head_coach$league)\n", "# Filter the head_coach data\n", "head_coach <- head_coach[head_coach$league %in% first_division_teams, ]\n", "head(head_coach, 5)" ] }, { "cell_type": "code", "execution_count": 18, "id": "3bd70b01", "metadata": { "vscode": { "languageId": "r" } }, "outputs": [ { "data": { "text/plain": [ " Team League Country HeadCoach \n", " Length:3532 Length:3532 Length:3532 Length:3532 \n", " Class :character Class :character Class :character Class :character \n", " Mode :character Mode :character Mode :character Mode :character \n", " \n", " \n", " \n", " \n", " Appointed EndDate Tenure Matches \n", " Min. :1886-06-26 Min. :1893-08-01 Min. : -242.0 Min.
: 0.00 \n", " 1st Qu.:1961-11-02 1st Qu.:1963-06-30 1st Qu.: 186.0 1st Qu.: 10.00 \n", " Median :1987-07-01 Median :1988-03-06 Median : 364.0 Median : 29.00 \n", " Mean :1982-05-15 Mean :1983-04-16 Mean : 608.2 Mean : 51.59 \n", " 3rd Qu.:2004-12-29 3rd Qu.:2005-06-30 3rd Qu.: 730.0 3rd Qu.: 67.00 \n", " Max. :2024-04-23 Max. :2024-06-30 Max. :14613.0 Max. :1490.00 \n", " NA's :64 \n", " Wins Draws Losses \n", " Min. : 0.00 Min. : 0.00 Min. : 0.00 \n", " 1st Qu.: 2.00 1st Qu.: 2.00 1st Qu.: 4.00 \n", " Median : 10.00 Median : 7.00 Median : 10.00 \n", " Mean : 22.53 Mean : 13.01 Mean : 16.05 \n", " 3rd Qu.: 28.00 3rd Qu.: 17.00 3rd Qu.: 21.00 \n", " Max. :895.00 Max. :323.00 Max. :272.00 \n", " " ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "columns_to_keep <- c('team_name', 'league', 'country', 'staff_name', 'appointed', 'end_date', 'days_in_post', 'matches', 'wins', 'draws', 'losses')\n", "head_coach <- head_coach[, columns_to_keep]\n", "\n", "# Rename columns\n", "colnames(head_coach) <- c('Team', 'League', 'Country', 'HeadCoach', 'Appointed', 'EndDate', 'Tenure', 'Matches', 'Wins', 'Draws', 'Losses')\n", "\n", "summary(head_coach)" ] }, { "cell_type": "code", "execution_count": 19, "id": "6f285bc5", "metadata": { "vscode": { "languageId": "r" } }, "outputs": [], "source": [ "# Saving the data\n", "write_csv(head_coach, \"data/extracted_head_coach.csv\")" ] }, { "cell_type": "code", "execution_count": null, "id": "1cd004b5", "metadata": { "vscode": { "languageId": "r" } }, "outputs": [], "source": [] } ], "metadata": { "jupytext": { "cell_metadata_filter": "eval,-all", "main_language": "R", "notebook_metadata_filter": "-all" }, "kernelspec": { "display_name": "R", "language": "R", "name": "ir" }, "language_info": { "codemirror_mode": "r", "file_extension": ".r", "mimetype": "text/x-r-source", "name": "R", "pygments_lexer": "r", "version": "4.3.2" } }, "nbformat": 4, "nbformat_minor": 5 }